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Gravitational- wave astronomers often wish to characterize the expected parameter-estimation accuracy of fu- 
ture observations. The Fisher matrix provides a lower bound on the spread of the maximum-likelihood estimator 
across noise realizations, as well as the leading-order width of the posterior probability, but it is limited to high 
signal strengths often not realized in practice. By contrast, Monte Carlo Bayesian inference provides the full 
posterior for any signal strength, but it is too expensive to repeat for a representative set of noises. Here I de- 
scribe an efficient semianalytical technique to map the exact sampling distribution of the maximum-likelihood 
estimator across noise realizations, for any signal strength. This technique can be applied to any estimation 
problem for signals in additive Gaussian noise. 
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The first direct detections of gravitational-wave (GW) sig- 
nals are likely to be achieved in the second half of this decade 
with second-generation ground-based interferometric detec- 
tors such as Advanced LIGO and Virgo Q, which are sensi- 
tive to high-frequency GWs (10-1000 Hz). Third-generation 
instruments such as the Einstein Telescope [2] promise much 
greater reach and yield, while space-based observatories sim- 
ilar to LISA 13 will be sensitive to low-frequency GWs 
(10 _5 -10 _1 Hz) in a band populated by thousands of de- 
tectable sources in the Galaxy and far beyond. A survey of 
the scientific literature on GW data analysis in this prede- 
tection era reveals a few dominant genres: articles on data- 
analysis methods and implementations [e.g., 4]; nondetection 
and upper-limit analyses of actual collected data [e.g.,|5]; and 
"prospects" papers that examine combinations of GW sources 
and detectors to characterize the expected rates of detection 
and accuracies of source-parameter estimation [e.g.,|6]|. 

Papers in this last class, especially when they are concerned 
with Bayesian inference for modeled GW signals, follow a 
well-rehearsed structure: a) derive the GW signal h as a func- 
tion of the source parameters ; b) fix fiducial true values 6 tr 
for them; c) simulate a realization n of detector noise, usu- 
ally assumed as additive and Gaussian; d) derive the noise- 
dependent probability distribution p(B\n) for the source pa- 
rameters given the data s = h ir + n\ e) characterize the error 
and uncertainty of p(6;n); f) finally, repeat steps c, d, and 
e for a sufficiently broad sample of noise realizations, since 
each of these will result in different parameter estimates and 
uncertainties. Optimally, repeat for different fiducial t r- 

Step d [calculating p(Q;ri)] is usually very computation- 
ally expensive: for more than two or three source parameters, 
in Bayesian inference it must be performed with a stochas- 
tic technique such as Markov chain Monte Carlo integration 
ITlL which may involve ~ 10 6 evaluations of the likelihood 
for different 6s. Each evaluation requires obtaining the GW 
signal h(6) and performing an FFT (say, for 10 5 points) to 
compute the likelihood of the noise residual [see the discus- 
sion around Eq. ([]])] . Step f is therefore a Monte Carlo integra- 
tion of Monte Carlos integrations (perhaps again 10 6 of them), 
with a total computational cost, for a single tY , of ~ 10 18 



floating-point operations, more than can usually be procured 
easily. More emphatically, I call this scheme the Holy Grail 
of parameter-estimation prospects: even the papers that try 
hardest [e.g., [8), perform meta-Monte Carlo studies of ~ 100 
combinations of noise realizations and fiducial sources. 

Most "prospects" papers, instead, take advantage of the fact 
that steps d to f can be short-circuited when the signal-to-noise 
ratio (SNR) of detection is sufficiently high that p(0;ri) col- 
lapses to a normal distribution centered around the maximum- 
likelihood parameters 0mi(w), with covariance given by the 
inverse Fisher matrix, which is not a function of n. At high 
SNR, the distribution of 6 m \ for different noise realizations is 
itself normal and centered at 6 tr , with covariance again given 
by the inverse Fisher matrix. The computational cost of this 
approximation (for a single 6 tr and 10 5 -point likelihoods) is 
~ d 2 x 10 6 floating-point operations, where d is the number of 
source parameters, affording virtually instantaneous results. 

Unfortunately, as I discuss at length in Ref. [9] and as 
highlighted elsewhere OHO), for many practical parameter- 
estimation problems SNRs will not be sufficiently high to jus- 
tify the approximation. The crux of the problem is that the 
Fisher matrix, built from the partial derivatives dh/d0i of the 
signal, can represent h correctly only if h is linear in all the 
0i across ranges comparable to the expected parameter errors. 
These decrease as the SNR grows, making the condition less 
stringent; in Ref. (9), I provide a criterion to determine when 
the SNR is high enough. 

Thus, in general a Monte Carlo integration is needed for 
step d; furthermore, the meta-Monte Carlo integration of step f 
is also necessary, because the particular noise realization does 
affect parameter uncertainties, as shown in Fig. [2] for the toy 
model discussed later in this Letter. It is true ifTTIl that for 
additive Gaussian noise the average of p(B\n) across all n's 
equals p(6;n = 0); but this averaged likelihood describes an 
unrealistic limit (infinite observations of the same source with 
a finite total SNR) that is not representative of average or typ- 
ical errors, except trivially in the high-SNR limit. Thus, the 
shape and width of generic p(Q\n) cannot generally be ob- 
tained from p(0;n = 0), as suggested in Ref. Q2J. In Ref. (51 
I derive an expansion of p(6;n) in powers of 1/SNR, where 
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^-dependent terms are seen at next-to-leading order for both 
the centering and shape of the posterior. 

In this Letter I describe and test a technique to perform as 
less thorough, but more affordable survey than a nested Monte 
Carlo: mapping the sampling distribution of the maximum- 
likelihood (henceforth, ML) estimator m i [the maximum of 
each p(6;n)] over all noise realizations. From a classical- 
statistics (a.k.a. "frequentist") viewpoint, the spread of this 
distribution is formally the uncertainty of the ML point esti- 
mator. From a Bayesian viewpoint fT3lL if priors are unimpor- 
tant, the spread of this distribution describes one major com- 
ponent of the expected error (the other is the intrinsic spread 
of the posterior for each n). 

Mapping the sampling distribution of the ML estimator. 
We consider a large set of experiments in which we observe 
the signal h(6 tr ), where 0{ r is the d-dimensional vector of true 
parameter values and h is an TV-dimensional vector (e.g., a 
time series) with N ^> d. In each experiment, the detector 
output is s = n + h(0ir) 9 where n is additive Gaussian noise 
distributed with p(n) = ^exp — (n,ri)/2. Here (•,•) is the 
standard signal inner product, given in one convention (6) by 
(s, t) = 4 Re / °° s* (f)t(f) /S(f) df, with ~ denoting the Fourier 
transform and * the complex conjugate. In this Letter, with- 
out loss of generality, we treat (•, •) as the inner product of an 
abstract linear space, and let \s\ 2 = (s,s). 

For a given tr and noise realization n tr , the ML estimator 
0mi(^tr, 0tr) maximizes the probability of the residual noise n 
obtained by subtracting the postulated signal h(6) from the 
data s = h(0ir) + n ir \ that is, it maximizes 

p(n = s- h{6)) oc e-IMW+« tr -Me)| 2 /2. (1) 

Thus, 6 m \ must satisfy the vector equation 

ML i (0;n,e tr ) = (d i h(G),h(0 tr )+n-h(6))=0, (2) 

where the ML ; are the partial derivatives of —2 log p. 

Our purpose is to map the distribution of 6 m \ across all 
noise realizations. We can do this by enumerating all possible 
n's [weighted by p(n)], figuring out the 6^ corresponding to 
each, and accumulating the resulting distribution of 0^. For- 
mally, 

P (o ml = e\e tv ) = J s(e ml (n,e tr )-e) P (n)dn. (3) 

Unfortunately, because h(6) is generally a complicated func- 
tion of the 6\ it is difficult to solve ML/ = for m i given n, 
so we can only integrate Eq. ([3} by using a Monte Carlo ap- 
proach where we generate full, high-dimensional realizations 
of n (e.g., as time series), search parameter space for 6 m \ by 
repeatedly evaluating Eq. (p}, and iterate for many different n. 

The main result of this paper is that there is a more effec- 
tive way to map p(0 m \)\ we enumerate the 6 m \, and com- 
pute the total probability weight of the n that are compat- 
ible with each. (Put slightly differently, for each 6 m \ we 
count how many experiments would yield it as the maxi- 
mum of likelihood.) Because the ML; are linear functions 
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FIG. 1. To compute p(0 m \), we integrate p(n) over all noise re- 
alizations that satisfy ML/ = — see the discussion below Eq. (4). 
Here/*(0) = (cos 0, sin 6),p(n x ,n y ) = (27r) _1 exp — (w 2 +^)/2, and 
0ml = arctan(s y /s x ). The resulting p(6 m \\6 tY = 0) is shown top right. 

of the n, this weight is the probability mass of the (N — tri- 
dimensional subspace of the noise vectors that solve ML ; = 
for all i. Using the 5-of-function relation for the vector ML^, 
S(ML t (9;nA)) = S(MMtr) - 0)/\dML i (0)/dO j \, we 
can then rewrite p(6 m \ = 0|0t r ) as 

JV J n k 8(ML k (0;n,dtr)) x \dMLi/d0j\ e~^ /2 dn. (4) 

Figure [T] exemplifies the process of integrating over the ft's 
compatible with a chosen 6 m \. In this low-dimensional model 
fBlL detector data are described by a point in the plane, and 
the signal family h(6) by the unit circle. For each n, the true 
signal /ztme is displaced to a different s = h tme + n with proba- 
bility p(n) indicated by the shading. Projecting back to h(6) 
identifies the ML waveform h^ and parameters m i. All noise 
realizations that produce an s within the gray sector project to 
the same h^, so integrating p(n) over the sector yields /?(0mi). 

Equation ^ is especially powerful because, contrary to ap- 
pearance, it does not require integration over the full N dimen- 
sions of the noise, but only over ~ d 2 coordinate directions 
corresponding to the ML^ and dMLi/dOj. Since both these 
sets of functions are linear in n, they can be seen as jointly 
normal random variables that are fully characterized by their 
inner products; all the other noise degrees of freedom have no 
effect on the integral other than its normalization. (We spell 
this out in detail in the next section.) With Eq. ^ in hand, 
we can then sample /?(0mi) directly (for low d), or by Markov 
chain Monte Carlo techniques. 

Evaluating the master integral. Equation ([4} can be evalu- 
ated elegantly as the expectation value of a function (the deter- 
minant) of correlated random variables. For clarity, we follow 
a more pedestrian approach: we begin by transforming the N- 
dimensional integral over the n to new coordinates where the 
first few basis vectors span the random variables of interest. 
Namely, we write n in terms of a new basis where the first d 
vectors are obtained by orthonormalizing the djh = hi, 

h\ochi, hioc h-^hj^hjyhi\ (5) 
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FIG. 2. Likelihood maps obtained for the toy model with differ- 
ent n's and with the same (0£ tr ,./trX located at the origin of the plots. 
Contours are plotted at levels that would correspond to 1-cr, 2-o\ and 
3-ct ellipses for a normal distribution. The signal s = h + n is shown 
above each map. 

furthermore, we obtain the (d + l)-th to d(d + 3) /2-th basis 
vectors by orthonormalizing the c^/z = with i > j; the re- 
maining N — d(d + 3)/2 noise combinations complete the ba- 
sis. 

We can now rewrite all the variables that appear in Eq. ^ 
in terms of the new basis: 



h 



7 j 



(6) 



here the AT, Cf , and C£ are the coefficients that expand the n, 
hi, and respectively, in terms of the h k . We use Einstein 
summations, and treat (ij) both as a double index and as a 
single index flattened to the range ( d + 1) , . . . , d( d + 3)/2. By 
virtue of orthonormalization, Cf = for k > i and Cfj = for 

k > (ij), with C\ > and cfj ;) > 0. Furthermore, 



ML, 
MLy 



C$N k -Ct(Ah,n k ), 

C%(Ah : h m )-CfC jk , 



■C m -N> 



(7) 



with Ah = h(6) — h(6 tr ), and with the sums over k limited to 
k < i and the sums over m limited to m < (ij). 

To perform the integration in Eq. ([?]), we use the first 5 to fix 
N\ = (Ah, h\), yielding a 5 -normalization factor of 1/Cj; we 
use the second 5 to fix N2 = (A/z,%) (after the terms propor- 
tional to N\ in the 5 cancel out), yielding a factor of l/Cf ; and 
so on. Next, we perform the trivial integral over the "signal- 
orthogonal" noise degrees of freedom that correspond to the 
h k with k > d(d + 3)/2, leaving 



p(e ml = e\e tr ) 

1 



C m -N 



e -Lf =l (AhA) 2 /2 

: (2n) d ^)l A n d k=l C k k 
-C%(Ah,h m )-CfC jk 



-N m N m /2 



dN m 



(8) 



here the sums over m span [15] m = {d+ 1), . . . , d(d + 3)/2, 
and the sum over k spans k = 1 , . . . , d. 

For a given 6, the main computational cost of evaluating 
Eq. §8§ resides in the orthonormalization and in the compu- 
tation of the (Ah,h m ), which together require ~ d 4 /8 inner 
products (and therefore Af-dimensional signal FFTs); by con- 
trast, the d(d + 1 ) /2-dimensional Gaussian integral can be 
evaluated much more cheaply (e.g., by a 10,000-point Monte 
Carlo integration over N m drawn from a normal distribution), 
since the integrand is a function of small matrices and not long 
FFTs. 

Fisher-matrix limit. The well-known high-SNR, Fisher- 
matrix limit, in which the waveform can be approximated 
as a linear function of the parameters (and in which the 
6^ have a simple jointly-normal distribution), follows eas- 
ily by specializing Eq. Without loss of generality, we 
set h(0) = Q l hi, it follows that h tJ = C\- = 0. The integra- 
tion over the N m is trivial, and yields (27t) d ^ d ~ l ^ 4 \CfC jk \ 9 
where CfCjk = (h^hj) = Fij is the Fisher matrix. Further- 
more, Ah = h A6 l , where A6 l = 6 r 



ml 



6b is the error of the 



ML estimator, so the first exponential of Eq. ([8]) can be rewrit- 
ten as exp(— A6 l F^ AO j /2), since the h k , for k = 1, . . . ,d, span 
a complete basis for the h\. Last, because of the structure of 

the Cf, n d =l C k k = y/\CfC jk \ = v^f. Taking everything to- 
gether, we reproduce the Fisher-matrix result for the distribu- 
tion of 6 m \ fT6lL 



p(flU|Qr) 
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(2nY\FiJ 



(9) 



The general result [Eq. ([8])] can be restated in terms of 
in a form more suitable to computation: 

e -(Ah,hi)(F- l yj(Ah,hj)/2 

P(0mi = 0\0tv) 



/1 



-(Ah,hij)-M {ij) \ 



(27t)d(d-i)/2\ D ^ v 

Mn{D~ l Y V 'My j! 



dM„; (10) 



here D^y = D^n^ is the d(d — 1) /2-dimensional square 
matrix given by the products (h'^ph' kl ) with j < i, I < t. 
the primes denote projection orthogonal to h k (i.e., ti { ■ = 

Ld+i Cijhjc), an d ^(ij) is me matrix obtained from the 
d(d—l) /2-dimensional vector of integration variables by 
remapping indices. 

Toy model. To exemplify the use of Eqs . ^ to map p(Q m \) 
for low-SNR parameter estimation, let us consider a family of 
sine-Gaussian signals given by 



h(t\A,aJ) =Ae~ t2/2a2 sin(2n ft). 



(11) 



We consider the problem of jointly estimating a and /, but 
not A, which we fix to yield SNR = 5. For our example, we 
select a tr = 1 and f tr = 0.25. As shown in Fig.[2j at this low 
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FIG. 3. Exact (left) and Fisher-matrix (right) sampling distribution of 
6 m \. Contours are plotted as in Fig. [2] with the noisy white curves and 
shading derived from the numerical maxima of 100,000 likelihood 
maps for different n's, and the dark curves from Eq. ([4]). The dot 
marks the most probable 6 m \, slightly displaced from 6 tY . 

SNR different noise realizations yield strikingly different like- 
lihood maps for the same true signal — all quite different from 
the ellipsoidal Fisher prediction. In each map, the pale blue 
dot marks the location of 6 m \ — indeed, the distribution of blue 
dots is just the desired p(0 m \). 

In Fig. [3] I show maps of p(6 m \) obtained by using both 
the brute-force approach [Eq. ([3])] and the new method [Eq. 
Q]. For the former (the noisy white curves and shading), I 
produced 100,000 likelihood maps p(6',n) with different n's 
drawn from p(n), and 2D-histogrammed the resulting 6 m \. For 
the latter (dark curves), I used Eq. ([?]), as implemented by 
Eq. ([8]). As expected, the maps agree, but the new method is 
considerably faster. For comparison, the top-right plot shows 
the Fisher-matrix prediction ([9]). 

Conclusions. I have described a novel approach to create 
exact maps, for any SNR, of the distribution of the ML estima- 
tor for the source parameters of a signal embedded in additive 
Gaussian noise. This distribution would be obtained in a large 
set of observations of the same true signal with different noise 
realizations, each appearing with probability p(n). Given a 
single observation, such a map embodies the frequentist no- 
tion of uncertainty for the ML estimator. From a Bayesian 
viewpoint, if priors are unimportant, the map characterizes the 
distribution of possible maxima of posterior probabilities. 

In comparison to the computational cost of the "Holy Grail" 
nested Monte Carlo (10 18 operations), we estimate the cost 
of Eq. ([4]) as follows: 10 6 Monte Carlo samples of can- 
didate m is, times d 4 /8 inner products, times 10 6 floating- 
point operations for each inner-product FFT (again assuming 
N = 10 5 ); thus, even for d = 10, this scheme involves ~ 10 15 
operations — a thousand times cheaper. 

These maps can be used directly, in both frequentist 
and Bayesian frameworks, to study parameter-estimation 
prospects, but also to perform stringent tests of Fisher-matrix 
predictions at low SNRs, and to provide proposal distributions 



for Monte Carlo searches of unknown sources. An interesting 
feature in this regard is that ML ; = 0, a local condition, does 
not distinguish between primary and secondary maxima of the 
likelihood, and it will include the latter (if they are sufficiently 
probable) in the maps; thus Eq. ^ could be exploited to en- 
able jumps between separated peaks in complex likelihoods, 
which are generally very difficult to locate. While this result 
was derived in the context and with the motivation of GW sci- 
ence, it is applicable to statistical inference for any problem 
where noise can be regarded as Gaussian and additive, such 
as several that arise in high-energy physics and observational 
cosmology. 
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